Datset and Preliminary Analysis

Electrical power plants need a fault detection system in minimum possible time for both equipment protection and remaining stable. For this, a power system is designed using MATLAB by robotics engineers, where the circuit is simulated under normal and faulty conditions, results of which has been recorded to a dataset. The dataset can be found at: https://www.kaggle.com/esathyaprakash/electrical-fault-detection-and-classification?select=detect_dataset.csv. [1]

The following table shows the simulation instances where voltage and current values for sources A, B, and C are measured as Va, Vb, Vc, Ia, Ib, and Ic respectively. And the output field indicates whether the system is faulty. As can be inferred, this is a binary classification, ie, the system is either faulty (indicated by output = 1) or not (indicated by output = 0).

head(data)
##   Output..S.         Ia         Ib        Ic        Va         Vb        Vc
## 1          0 -170.47220  9.2196135 161.25258 0.0544900 -0.6599209 0.6054309
## 2          0 -122.23575  6.1686674 116.06709 0.1020000 -0.6286115 0.5262016
## 3          0  -90.16147  3.8136322  86.34784 0.1410255 -0.6052769 0.4642513
## 4          0  -79.90492  2.3988035  77.50611 0.1562725 -0.6022353 0.4459629
## 5          0  -63.88525  0.5906674  63.29459 0.1804515 -0.5915014 0.4110499
## 6          0  -55.95468 -1.0018817  56.95656 0.1934141 -0.5906954 0.3972813
faulty <- subset(data, data$Output..S. == 1)
no_fault <- subset(data, data$Output..S. == 0)
cat("The number of instances with no fault: ", length(no_fault$Output..S.))
## The number of instances with no fault:  6505
cat("The number of instances with fault: ", length(faulty$Output..S.))
## The number of instances with fault:  5496

The following plots show the effect of change in inputs (“Ia”, “Ib”, “Ic”, “Va”, “Vb”, “Vc”) on the output (faultiness).

Preliminary Analysis Conclusion

For faultiness, only by observing the voltage values, it is hard to say something, as all cases with non-faulty outputs have a wider range of voltages, containing that of the faulty ones. Yet, the measured voltage values falling to the narrow range of input with faulty outcomes, although it overlaps with the one with the non-faulty output, can be considered as the “risky” ones. The range of current values with non-faulty outputs is far more narrow than that of the faulty ones. That is, the current values falling to these areas, not covered by current values with non-faulty outputs, are likely to be faulty.

Principal Component Analysis on Electrical-Fault-Detection Dataset

The following code snippet is to produce the principal components in R.

my_pc <- prcomp(data[,2:7], scale = TRUE)
print(my_pc)
## Standard deviations (1, .., p=6):
## [1] 1.308099e+00 1.215088e+00 1.193685e+00 1.098293e+00 4.258003e-01
## [6] 6.153101e-06
## 
## Rotation (n x k) = (6 x 6):
##           PC1         PC2         PC3        PC4          PC5           PC6
## Ia -0.3214696  0.05631243  0.19291470 -0.7715964 -0.510792617  5.430812e-08
## Ib  0.4815361  0.09135445  0.51824293  0.3409713 -0.612323584  2.485886e-08
## Ic -0.3709766 -0.17574352 -0.57681925  0.3881147 -0.590031399  2.355630e-08
## Va -0.1526212  0.79767914 -0.07124924  0.1056383 -0.002491956 -5.693616e-01
## Vb  0.5660423 -0.30660070 -0.38038483 -0.2956138 -0.087155015 -5.881462e-01
## Vc -0.4283216 -0.47676147  0.46012896  0.1979837  0.091714084 -5.743791e-01
summary(my_pc)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5       PC6
## Standard deviation     1.3081 1.2151 1.1937 1.0983 0.42580 6.153e-06
## Proportion of Variance 0.2852 0.2461 0.2375 0.2010 0.03022 0.000e+00
## Cumulative Proportion  0.2852 0.5313 0.7687 0.9698 1.00000 1.000e+00

One can yield from the previous table, showing the weights of the real features to the linear combination to produce the principal components, that most of the weights come from the voltage values, ie, Vb, Va, and Va for PC1, PC2 and PC3 respectively. This seems convenient with the comment made in the conclusion in the first step, ie, the voltage values measured correspond to a greater variance.

The following table and graph show the “importance” of the components, i.e., how informative they are to represent the majority of the instances.

summary(my_pc)
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5       PC6
## Standard deviation     1.3081 1.2151 1.1937 1.0983 0.42580 6.153e-06
## Proportion of Variance 0.2852 0.2461 0.2375 0.2010 0.03022 0.000e+00
## Cumulative Proportion  0.2852 0.5313 0.7687 0.9698 1.00000 1.000e+00
screeplot(my_pc)

As can be seen from the preceding graph, the first four components are the most “important” ones, as they carry much of the proportion of variance. And hence, it seems, the first two components do not suffice to represent the whole data. That’s why, I do not expect to see distinct clusters when visualized in 2d. In fact, the first four components seem to carry the most of the information; yet, for visualization, I will only be able to try with 2D and 3D.

The following figure is the distribution of data points based on the first two principal components.

ggplot2::autoplot(my_pc, data = data, colour = 'Output..S.')

Although I was not expecting to see distinct clusters in 2D, ie, by using only the first two principal components, it seems they do form clusters. One can observe that the non-faulty ones (represented by dots plotted in darker blue) and the rest seem to form clusters in 3d although the graph is in 2d.

The graph seems that it will become more meaningful if plotted in 3d. The interactive 3d plot can be seen below.

PC <- (my_pc$x)
PC1 <- c(PC[,1])
PC2 <- c(PC[,2])
PC3 <- c(PC[,3])
library(rgl)
cols = factor(data$Output..S., labels = c("red", "black"))
plot3d(x=PC1, y=PC2, z=PC3, col=cols)
legend3d("topright", legend = paste('Type: ', c('Non-Faulty', 'Faulty')), pch = 16, col=c("red", "black"), cex=1, inset=c(0.02))
rglwidget()

As can be seen from the plots (both in 2d and 3d), the non-faulty ones (red dots) form a cluster much more distinctly. This is also convenient with the previous discussion, so the evidence to support it strengthens.

Multi-dimensional Scaling on Electrical-Fault-Detection Dataset

To apply MDS, we first need a distance/proximity/(dis)similarity matrix, computed as follows:

(Due to memory constraints, I had to sample the dataset by a ratio of 1:10).

data_sampled <- data[sample(1:nrow(data), 1000), ]
dist_n <- dist(data_sampled, method = 'euclidean')

Based on the previous analysis, I plotted the rest of the graphs in 3D for meaningful visualization purposes.

MDS transforms the dissimilarity matrix to disparities to maximize the Kruskal’s stress function, ie, to optimally scale the dissimilarity values. There are multiple transformation functions provided in the smacof package. [2] Here, I applied three of them; namely, Ratio MDS, Interval MDS, and Monotone spline MDS, all of which provide a different levels of flexibility for transformation of dissimilarity to scaled distance.

Ratio MDS

cols = factor(data_sampled$Output..S., labels = c("red", "black"))
mds3 = mds(delta = dist_n, ndim = 3, type = "ratio")
library(rgl)
plot3d(mds3$conf, col = cols)
legend3d("topright", legend = paste('Type: ', c('Non-Faulty', 'Faulty')), pch = 16, col=c("red", "black"), cex=1, inset=c(0.02))
rglwidget()

Interval MDS

mds3 = mds(delta = dist_n, ndim = 3, type = "interval")
library(rgl)
plot3d(mds3$conf, col = cols)
legend3d("topright", legend = paste('Type: ', c('Non-Faulty', 'Faulty')), pch = 16, col=c("red", "black"), cex=1, inset=c(0.02))
rglwidget()

Monotone spline MDS

mds3 = mds(delta = dist_n, ndim = 3, type = "mspline")
library(rgl)
plot3d(mds3$conf, col = cols)
legend3d("topright", legend = paste('Type: ', c('Non-Faulty', 'Faulty')), pch = 16, col=c("red", "black"), cex=1, inset=c(0.02))
rglwidget()

Although sampled, ie, not all data is included, the data seem to form clusters well. Furthermore, the plotted graphs seem all convenient within themselves, and the previous PCA results.

All MDS methods results in more or less the same; that’s why, it strengthens the conclusions arrived at previous steps, ie, the measurements for non-faulty systems form a more strict cluster.

ISOMAP on Dataset

Finding the optimal K for ISOMAP using KNN Algorithm

normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

data_sampled <- data[sample(1:nrow(data), 1000), ]
true_labels <- factor(data_sampled$Output..S.)

data_normalized <- normalize(data_sampled[,2:ncol(data_sampled)])

set.seed(123)  # To get the same random sample
dat.d <- sample(1:nrow(data_normalized),size=nrow(data_normalized)*0.7,replace = FALSE) #random selection of 70% data.

train.data <- data_sampled[dat.d,] # 70% training data
test.data <- data_sampled[-dat.d,] # remaining 30% test data

#Now creating separate dataframe for 'Creditability' feature which is our target.
train.labels <- data_sampled[dat.d,1]
test.labels  <- data_sampled[-dat.d,1]



library(class)
## Warning: package 'class' was built under R version 4.1.2
# find the optimal k

i=1                          # declaration to initiate for loop
k.optm=1                     # declaration to initiate for loop
for (i in 1:100){
  knn.mod <-  knn(train=train.data, test=test.data, cl=train.labels, k=i)
  k.optm[i] <- 100 * sum(test.labels == knn.mod)/NROW(test.labels)
  k=i
}

plot(k.optm, type="b", xlab="K- Value",ylab="Accuracy level")

Although k=1 gives the best performance, choosing so small k is known to lead to overfitting. That’s why k=10 is chosen.

ISOMAP with k = 10

data_sampled <- data[sample(1:nrow(data), 100), ]
dist <- dist(data_sampled, method = 'euclidean')

dom <- factor(data_sampled$Output..S.)
iso <- isomap(dist, ndim=3, k=10)
rgl.isomap(iso, col = as.numeric(dom))
## Warning in ordirgl(x, ...): set isometric aspect ratio, previously was 0.855,
## 0.865, 1.837

HIERARCHICAL CLUSTERING

There are mainly two methods for hierarchical clustering, namely, top-down and bottom-up approaches. For the first one, we form a huge cluster to contain all data initially and then divide into clusters; whereas, for the second one, we have clusters for each data point and then merge them to bigger clusters.

AGGLOMERATIVE (BOTTOM-UP) APPROACH

There are different methods to define the inter-cluster and intra-cluster distances. The ones listed below are the ones used:

  1. Max Link Method: We choose the maximum distance between all pairs of data points from each cluster.

  2. Min Link Method: We choose the minimum distance between all pairs of data points from each cluster.

  3. Mean Method: Distance is taken to be the average of all pairs of distances between clusters.

  4. Ward’s Minimum Variance Method: Minimizes the total within-cluster-distance (intra-cluster distance)

The dataset is subsampled so that the tree to show the hierarchical clustering and the true labels are visible, and to keep the run time feasible. Below one can see the true labels and the clusters into which they are assigned, so they have an understanding of the clustering performance.

normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}
data_sampled <- data[sample(1:nrow(data), 100), ]
data_normalized <- normalize(data_sampled)
dist_mat_n <- dist(data_normalized, method = 'euclidean')

Mean Method:

hclust_avg <- hclust(dist_mat_n, method = 'average')
hclust_avg$labels <- data_sampled$Output..S.
plot(hclust_avg, cex = 0.6)
rect.hclust(hclust_avg, k = 2, border = 2:5)

DIVISIVE (TOP-DOWN) APPROACH

library(cluster)
dv <- diana(data_sampled)
dv$order.lab <- data_sampled$Output..S.
pltree(dv, cex = 0.6, hang = -1, main = "Dendrogram of diana")
rect.hclust(dv, k = 2, border = 2:10)

Assessment of the hierarchical clustering algorithms applied

In the above graphs, it seems ward link method and the max link method outperforms the others. That is, the others can be said to failed to cluster distinctively or be more susceptible to outliers, for instance, there is a cluster containing only one instance in the min link method. That’s why, I will move on with the Ward-link and the max-link methods.

And, as I already knew that is a binary classification problem, I choose k=2, ie, the second level from the root, for where to cut the tree from.

For inclusivity of the whole dataset, I subsampled it several times and run the algorithm on these instances.

The following code snippet is to visualize the clusters formed using the max link method:

K-MEANS CLUSTERING

As this is a binary classification problem, there are supposed to be 2 clusters; hence, k is initially chosen to be 2. Moreover, this algorithm starts with random initial clusters, and although it is guaranteed that it will converge, what it converges to might be the local minimum, not the global. That’s why, multiple times the same algorithm is run with different initial point configurations, and at the end of each run its performance is evaluated. And, finally, the initial configuration producing the best performance, ie, minimizing of the objective function, is chosen. The nstart parameter given as argument to the k-means function is that number of iterations tried to find the best initial configuration.

Moreover, subsampling has been applied here, as well, for the same reasons stated above.

2-Means Clustering

set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 100), ]
k2 <- kmeans(data_sampled1, centers = 2, nstart = 50)
fviz_cluster(k2, data = data_sampled1)

Re-Subsampling for 2-Means Clustering

set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 100), ]
k2 <- kmeans(data_sampled1, centers = 2, nstart = 50)
fviz_cluster(k2, data = data_sampled1)

Elbow Method

Normally k is a hyper-parameter that needs to be provided, and when it is unknown, it is something tricky to find. For this, one of the common methods to find the correct k value is the elbow method in which we choose the k where the elbow occurs on the graph. The elbow choice is somewhat ambiguous, as at each k, the graph decreases. That is, as the number of k grows larger, the total loss has to decrease, however, in the extreme case, we would form clusters for each element, which is obviously not meaningful. That’s why, we try to find the k value where the elbow is visible in the graph, ie, the loss has decreased drastically. The following graph shows the loss vs number of clusters:

I used multiple subsampling to obtain the graphs to show the optimal number of clusters:

set.seed(123)
fviz_nbclust(data_sampled1, kmeans, method = "wss")

Subsampling
set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 100), ]
fviz_nbclust(data_sampled1, kmeans, method = "wss")

Subsampling
set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 100), ]
fviz_nbclust(data_sampled1, kmeans, method = "wss")

After subsampled multiple times, all of the graphs obtained above seem consistent.

The elbow choice seem to be meaningful on the 6th cluster. And, k = 6 might actually be a meaningful choice, as this classification further goes down from a binary classification, ie, determining whether the system is faulty, to a multi-class classification, ie, determining where the fault is. In fact, I explained these two problems defined on this dataset in the second assignment, preliminary analysis. However, I moved on with the binary classification problem. The following excerpt, taken from my second assignment, defines the problem for multi-class classification problem:

"After detecting there is a fault, the main problem is to identify where it is. There are 3 phases (A, B, and C), and the Ground (Gnd) in the system whose intervals are where the problem might occur. Hence there 6 different possibilities (there are 3 intervals) regarding where the error is. More explicitly:

[G C B A]

[0 0 0 0] - No Fault

[1 0 0 1] - LG fault (Between Phase A and Gnd)

[0 0 1 1] - LL fault (Between Phase A and Phase B)

[1 0 1 1] - LLG Fault (Between Phases A,B and ground)

[0 1 1 1] - LLL Fault(Between all three phases)

[1 1 1 1] - LLLG fault( Three phase symmetrical fault)"

As can be seen there are six possible cases, and hence, this is a 6-class classification problem.

Choosing different k values

k = 3

k = 3 seems to be a candidate for an elbow, so it is my next choice for k.

k2 <- kmeans(data_sampled, centers = 3, nstart = 50)
fviz_cluster(k2, data = data_sampled)

k = 6

As explained above, one of the problems defined for this dataset is a 6-class classification problem, and it seems feasible to choose k=6 also due to the elbow method, that’s why, next choice for k is k = 6.

k2 <- kmeans(data_sampled, centers = 6, nstart = 50)
fviz_cluster(k2, data = data_sampled)

Although these clusters formed are not distinctly separated in 2D, they are insightful in both k = 2, and k = 6 choices.

These clustering graphs also seem that they are more meaningful in a 3D space, however, I was not able to visualize them in 3D. Yet, the 3D-like structure they all form suggest a consistency with what are obtained with the PCA and MDS results.

DBSCAN Clustering

Library and Subsampling

library(dbscan)
## Warning: package 'dbscan' was built under R version 4.1.2
set.seed(123)
data_sampled2 <- data[sample(1:nrow(data), 1000), ] #sub sampling

Different Configurations

R documentation suggests using minPts as “equal to the dimensionality of the data plus one or higher”. There are 6 input features so 6-7 is suggested. However in my manually done parameter search I found 4 to be the best option which is not far from 6. Document suggests using “kNNdistplot()” for choosing “eps” value. A sudden increase of the kNN distance suggests that the right of the sudden increase is most likely to be outliers. Inspecting the below graph the sudden incease is between 0 and 70. After searching the 0-70 range manually, the best value found is for “eps” is 15. DBSCAN documantation is in this link https://www.rdocumentation.org/packages/dbscan/versions/1.1-8/topics/dbscan

minPtsParam = 7
kNNdistplot(data_sampled2, minPtsParam)

### Eps: 35, MinPts:4

cluster_dbscanChosen <- dbscan(data_sampled2, eps = 15, minPts = 7)
fviz_cluster(cluster_dbscanChosen, data = data_sampled2)

Eps: 50, MinPts:4

cluster_dbscan <- dbscan(data_sampled2, eps = 50, minPts = minPtsParam)
fviz_cluster(cluster_dbscan, data = data_sampled2)

Eps: 250, MinPts:4

cluster_dbscan <- dbscan(data_sampled2, eps = 70, minPts = minPtsParam)
fviz_cluster(cluster_dbscan, data = data_sampled2)

Evaluation of Clusters

External Evaluation using Rand Index

I will move on with the “best” clusters presented above for external evaluation, ie, Hierarchical with Ward’s Link (cut at level=2), and 2-Means Clustering.

Rand Index Evaluation on 2-Means Clustering Method

k2 <- kmeans(data, centers = 2, nstart = 50)
res1 = external_validation(as.numeric(true_labels), k2$cluster, method = "adjusted_rand_index")
res1
## [1] 0.2393556

Rand index for 2-Means Clustering is acutely lower than that of Ward’s Link Method. I think this analysis suffices to say that Ward’s Link method outperformed 2-Means Clustering for this data, however, further comparison and analysis will be shown.

Internal and Relative Evaluation

Evaluating Hierarchical Clustering Methods

First, I will evaluate the clustering with 2 and 6 clusters, as that is what is meaningful for the data.

set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 1000), ]
intern <- clValid(data_sampled, nClust = c(2, 6), clMethods = c("hierarchical"), validation = "internal", method="ward")
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
summary(intern)
## 
## Clustering Methods:
##  hierarchical 
## 
## Cluster sizes:
##  2 6 
## 
## Validation Measures:
##                                  2       6
##                                           
## hierarchical Connectivity   4.1944 31.8734
##              Dunn           0.1695  0.2067
##              Silhouette     0.5047  0.6007
## 
## Optimal Scores:
## 
##              Score  Method       Clusters
## Connectivity 4.1944 hierarchical 2       
## Dunn         0.2067 hierarchical 6       
## Silhouette   0.6007 hierarchical 6

For the sake of inclusivity of all data, subsampling again:

set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 1000), ]
intern <- clValid(data_sampled, nClust = c(2, 6), clMethods = c("hierarchical"), validation = "internal", method="ward")
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
summary(intern)
## 
## Clustering Methods:
##  hierarchical 
## 
## Cluster sizes:
##  2 6 
## 
## Validation Measures:
##                                  2       6
##                                           
## hierarchical Connectivity   4.1944 31.8734
##              Dunn           0.1695  0.2067
##              Silhouette     0.5047  0.6007
## 
## Optimal Scores:
## 
##              Score  Method       Clusters
## Connectivity 4.1944 hierarchical 2       
## Dunn         0.2067 hierarchical 6       
## Silhouette   0.6007 hierarchical 6

As can be seen from the summary of clustering evaluation, one is not dominantly “superior to” the other, which again makes sense considering the semantics behind data, ie, there are both binary and 6-class classification problems defined over the dataset.

Comparing Hierarchical and K-Means Clustering Methods

set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 1000), ]
clmethods <- c("hierarchical","kmeans")
intern <- clValid(data_sampled, nClust = c(2, 6), clMethods = clmethods, validation = "internal", method="ward")
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
summary(intern)
## 
## Clustering Methods:
##  hierarchical kmeans 
## 
## Cluster sizes:
##  2 6 
## 
## Validation Measures:
##                                  2       6
##                                           
## hierarchical Connectivity   4.1944 31.8734
##              Dunn           0.1695  0.2067
##              Silhouette     0.5047  0.6007
## kmeans       Connectivity   8.8020 26.6619
##              Dunn           0.1304  0.1562
##              Silhouette     0.5006  0.6305
## 
## Optimal Scores:
## 
##              Score  Method       Clusters
## Connectivity 4.1944 hierarchical 2       
## Dunn         0.2067 hierarchical 6       
## Silhouette   0.6305 kmeans       6

Subsampling again
set.seed(123)
data_sampled1 <- data[sample(1:nrow(data), 1000), ]
clmethods <- c("hierarchical","kmeans")
intern <- clValid(data_sampled, nClust = c(2, 6), clMethods = clmethods, validation = "internal", method="ward")
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
## The "ward" method has been renamed to "ward.D"; note new "ward.D2"
summary(intern)
## 
## Clustering Methods:
##  hierarchical kmeans 
## 
## Cluster sizes:
##  2 6 
## 
## Validation Measures:
##                                  2       6
##                                           
## hierarchical Connectivity   4.1944 31.8734
##              Dunn           0.1695  0.2067
##              Silhouette     0.5047  0.6007
## kmeans       Connectivity   8.8020 26.6619
##              Dunn           0.1304  0.1562
##              Silhouette     0.5006  0.6305
## 
## Optimal Scores:
## 
##              Score  Method       Clusters
## Connectivity 4.1944 hierarchical 2       
## Dunn         0.2067 hierarchical 6       
## Silhouette   0.6305 kmeans       6

Connectivity should be minimized, whereas, the others should be maximized. The graph for connectivity suggests, more number of clusters yield poorer clustering and it is almost the same for both hierarchical and k-means clustering.

The more number of clusters seem to yield better clustering for both K-means and hierarchical clustering according to Dune Index graph. Also, as the number of clusters is 6, it seems that the performance of hierarchical and k-means are almost the same, which might suggest that 6 is a feasible choice for the number of clusters, which is consistent with the defined problem and previous analyses.

Analysis on these graphs does not yield obvious results regarding which one outperfomed the other. It might be said that k-means is better (not dominantly) using the internal indices. However, this is inconsistent with what is discussed in Rand Index evaluation.

Validation of DBSCAN

clValid function does not accept DBSCAN as a method, thus silhouette index found seperately. Average silhouette value is 0.3. The range for Silhouette Index is between -1 and 1. The found result is 0.3 which is higher than 0 which makes this result acceptable. Silhouette index combines Dunn index which evaluates inter cluster distance and Davies-Bouldin Index which evaluates intra cluster distance. So this index by itself is a satisfying metric.

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
sil.dbscan <- silhouette(cluster_dbscanChosen$cluster, dist(data_sampled2))
library(RColorBrewer)
clusterColours <- brewer.pal(9,"Set1")
plot(sil.dbscan, border=NA, col=clusterColours[sort(clusters)], main="")

Analysis and Evaluation of Results

Using external metrics to evaluate the clusterings, which is more viable, we can conclude that hierarchical clustering with Ward’s Link method clearly outperforms the other metrics. However, the internal indices, although do not suggest directly the opposite, might have been misleading if we did not have the true labels, ie, ground truth.

CONCLUSION

As a result of convenient results of all steps so far, it seems that data forms clusters. That is, upon the arrival of a new data instance, one can infer by projecting that instance point to the dimensions forming the plots and assess whether it falls into the “possibly-faulty” category and behave accordingly, which was the whole purpose from the beginning.

REFERENCES

[1] Prakash, E. S. (2021, May 22). Electrical fault detection and classification. Kaggle. Retrieved November 30, 2021, from https://www.kaggle.com/esathyaprakash/electrical-fault-detection-and-classification?select=detect_dataset.csv.

[2] Comprehensive R Archive Network (CRAN). (2021, July 20). Multidimensional scaling [R package smacof version 2.1-3]. The Comprehensive R Archive Network. Retrieved November 30, 2021, from https://cran.r-project.org/web/packages/smacof/.